Re-initialize failed Kubernetes clusters #7234

Merged (1 commit) on Mar 26, 2020

tstromberg
Contributor

@tstromberg tstromberg commented Mar 25, 2020

If we fail to bring up a Kubernetes cluster, this PR will reset it and bring it up again.

This helps primarily for two cases:

  • none driver initialization where untracked Kubernetes pods are running from a previous install.
  • clusters which have webhooks that prevent addons from being deployed.

Fixes #6425 and #6312 and numerous other errors which resemble:

	[ERROR Port-10251]: Port 10251 is in use
	[ERROR Port-10252]: Port 10252 is in use
	[ERROR Port-2380]: Port 2380 is in use
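
The reset-then-retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `clusterOps` interface, `startWithRecovery`, and `fakeCluster` are hypothetical names invented for the example, and the choice to abort when the reset itself fails is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// clusterOps is a hypothetical stand-in for a bootstrapper.
// It is NOT minikube's actual API.
type clusterOps interface {
	Init() error
	Reset() error
}

// startWithRecovery tries to bring the cluster up once. If that fails,
// it resets the cluster and retries exactly one more time.
func startWithRecovery(c clusterOps) error {
	err := c.Init()
	if err == nil {
		return nil
	}
	fmt.Printf("init failed, resetting cluster: %v\n", err)

	// Assumption: a failed reset aborts the retry.
	if rerr := c.Reset(); rerr != nil {
		return fmt.Errorf("reset after failed init: %w", rerr)
	}
	if err := c.Init(); err != nil {
		return fmt.Errorf("init after reset: %w", err)
	}
	return nil
}

// fakeCluster fails Init until Reset has been called, mimicking
// leftover state from a previous install.
type fakeCluster struct {
	stale         bool
	inits, resets int
}

func (f *fakeCluster) Init() error {
	f.inits++
	if f.stale {
		return errors.New("Port 10251 is in use")
	}
	return nil
}

func (f *fakeCluster) Reset() error {
	f.resets++
	f.stale = false
	return nil
}

func main() {
	c := &fakeCluster{stale: true}
	err := startWithRecovery(c)
	fmt.Printf("err=%v inits=%d resets=%d\n", err, c.inits, c.resets)
}
```

Retrying only once keeps a persistent failure from looping forever; whether a single retry is the right limit for a real bootstrapper is a judgment call.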

@k8s-ci-robot added labels on Mar 25, 2020: cncf-cla: yes (the PR's author has signed the CNCF CLA) and size/L (a PR that changes 100-499 lines, ignoring generated files).
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tstromberg


@k8s-ci-robot added a label on Mar 25, 2020: approved (the PR has been approved by an approver from all required OWNERS files).
@medyagh
Member

medyagh commented Mar 25, 2020

/ok-to-test

@k8s-ci-robot added a label on Mar 25, 2020: ok-to-test (a non-member PR verified by an org member that is safe to test).
@minikube-pr-bot

Error: running mkcmp: exit status 1

@tstromberg
Contributor Author

/ok-to-test

@@ -504,11 +529,32 @@ func (k *Bootstrapper) DeleteCluster(k8s config.KubernetesConfig) error {
 		cmd = fmt.Sprintf("%s reset", bsutil.InvokeKubeadm(k8s.KubernetesVersion))
 	}
 
-	if rr, err := k.c.RunCmd(exec.Command("/bin/bash", "-c", cmd)); err != nil {
-		return errors.Wrapf(err, "kubeadm reset: cmd: %q", rr.Command())
+	rr, derr := k.c.RunCmd(exec.Command("/bin/bash", "-c", cmd))
Member

This code seems similar to the Stop() implementations in the KIC and none drivers. I wonder if we could get rid of those (not sure), since we have it here.

If minikube stop calls delete cluster, could we get rid of those?

@medyagh
Member

medyagh commented Mar 25, 2020

For this PR, we still see the bind problems:
https://storage.googleapis.com/minikube-builds/logs/7234/442a786/Docker_Linux.html#fail_TestStartStop%2fgroup%2fcontainerd

helpers.go:200: TestStartStop/group/containerd logs: * Problems detected in kube-apiserver [6f8628b796d086bbad75355810a66403e4d2dc6f26b1d59a02e709e0ee15a3b7]:
- Error: failed to create listener: failed to listen on 0.0.0.0:8444: listen tcp 0.0.0.0:8444: bind: address already in use
* Problems detected in kubelet:
- Mar 25 21:52:52 containerd-20200

@tstromberg
Contributor Author

Correct: this does not prevent the error from happening; it only provides recovery from it.

There isn't enough detail available to show why the dashboard failed to deploy.

@tstromberg tstromberg merged commit 095ccbe into kubernetes:master Mar 26, 2020
Successfully merging this pull request may close these issues.

none driver: Port in use error during reuse